
Problèmes d'intégration morphologique d'emprunts d'origine anglaise en français

Author: Sagot Benoît
Walther Géraldine
Publication venue: HAL CCSD
Publication date: 05/10/2011

We propose a morphological study of borrowing, in particular verbal borrowing, from English into French. Using data extracted from a large corpus, we study the morphological processes by which new lexical units are integrated (in their graphemic form) and the problems these processes raise, notably orthographic instability and derivational mechanisms. This study thus provides a first theoretical approach to the morphological phenomenon of borrowing. It is then intended to serve as the theoretical basis for the automatic processing of borrowings.

Enriching Morphological Lexica through Unsupervised Derivational Rule Acquisition

Author: Nicolas Lionel
Walther Géraldine
Publication venue: HAL CCSD
Publication date: 01/01/2011

WoLeR 2011 is endorsed by FlaReNet, and supported by the Alpage team and the EDyLex French national grant (ANR-09-CORD-008). In a morphological lexicon, each entry combines a lemma with a specific inflection class, often defined by a set of inflection rules. Such lexica therefore usually give a satisfactory account of inflectional operations. Derivational information, however, is usually poorly covered. In this paper we introduce a novel approach for enriching morphological lexica with derivational links between entries and with new entries derived from existing ones and attested in large-scale corpora, without relying on prior knowledge of possible derivational processes. To achieve this goal, we adapt the unsupervised morphological rule acquisition tool MorphAcq (Nicolas et al., 2010) so that it can take into account an existing morphological lexicon developed in the Alexina framework (Sagot, 2010), such as the Lefff for French and the Leffe for Spanish. We apply this tool to large corpora, thus uncovering morphological rules that model derivational operations in these two lexica. We use these rules to generate derivation links between existing entries, as well as to derive new entries from existing ones and add those that are best attested in a large corpus. In addition to lexicon development and NLP applications that benefit from rich lexical data, such derivational information will be particularly valuable to linguists who rely on vast amounts of data to describe and analyse these specific morphological phenomena.

CiteSeerX

HAL-UNICE

Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin

Author: Sagot Benoît
Walther Géraldine
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2017

In this paper, we present ongoing work for developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are designed to improve the speed and reliability of corpus annotations for noisy data involving large amounts of code-switching, occurrences of child speech and orthographic noise. Being able to increase the efficiency of language resource development for language documentation and acquisition research also constitutes a step towards solving the data sparsity issues with which researchers have been struggling.

Crossref

Problèmes d'intégration morphologique d'emprunts d'origine anglaise en français

Author: Sagot Benoît
Walther Géraldine
Publication venue: HAL CCSD
Publication date: 01/01/2011

Développement de ressources pour le persan : le nouveau lexique morphologique PerLex 2 et l'étiqueteur morphosyntaxique MElt-fa

Author: Faghiri Pegah
Sagot Benoît
Samvelian Pollet
Walther Géraldine
Publication venue: HAL CCSD
Publication date: 01/01/2011

In this article we present a new version of PerLex, a morphological lexicon of Persian; a corrected and partially re-annotated version of the BijanKhan tagged corpus (BijanKhan, 2004); and MEltfa, a new, freely available part-of-speech tagger for Persian. Having developed a first version of PerLex (Sagot & Walther, 2010), we propose here an improved version. In addition to partial manual validation, PerLex 2 now relies on a linguistically motivated inventory of categories. We have also developed a new version of the BijanKhan corpus: this new version contains significant corrections to the tokenization as well as a retagging using the new categories. Finally, this new version of the corpus was used to train MEltfa, our freely available part-of-speech tagger for Persian, which relies on this new category inventory, on PerLex 2, and on the MElt tagging system (Denis & Sagot, 2009).

A new morphological lexicon and a POS tagger for the Persian Language

Author: Faghiri Pegah
Sagot Benoît
Samvelian Pollet
Walther Géraldine
Publication venue: HAL CCSD
Publication date: 01/01/2011

In (Sagot and Walther, 2010), the authors introduce an advanced tokenizer and a morphological lexicon for the Persian language named PerLex. In this paper, we describe experiments dedicated to enriching this lexicon and using it for building a POS tagger for Persian.